1. Introduction

In engineering and biomedical sciences, parametric models are frequently used in 
analyzing survival data. This analysis is often complicated by the presence of right 
censoring. Typically right censored data arise in medical studies when patients 
cannot be followed to the event of interest. 

A common parametric method of estimation is the maximum likelihood approach,
which is efficient if the specified parametric model is valid. In many practical
situations, however, there is no certainty that the data come from the specified
parametric model; they may in fact come from some neighborhood of the model.
Likelihood based estimation procedures can lead to poor results when the underlying
model is misspecified or contaminated. In such instances, the maximum likelihood
estimator is not robust against data or model inadequacies. The need for robust
statistical techniques for estimation and testing has been stressed by many authors;
we refer to Huber (1981), Hampel et al. (1986), Maronna et al. (2006) and the
references therein.

In this paper, we consider parametric estimation for right censored data with and
without contamination, and try to balance the dual aims of robustness and efficiency
using minimum divergence estimators.
Keziou (2003) and Broniatowski and Keziou (2009) introduced the class of dual
divergence estimators for general parametric models. The procedure is based on the
optimization of a new dual form of a divergence and includes the maximum
likelihood as a benchmark. Toma and Broniatowski (2010) proved that this
class contains robust and efficient estimators and proposed robust test statistics
based on divergence estimators.

A major advantage of the method is that it does not require additional accessories
such as kernel density estimation or other forms of nonparametric smoothing to
produce nonparametric estimates of the true underlying density function.
In the case of i.i.d. data, plugging in the empirical distribution function is
sufficient to estimate the divergence. In the right censoring scenario, one can
replace the empirical distribution function with the estimate of the cumulative
distribution function based on the Kaplan-Meier estimator (Kaplan and Meier, 1958).
Thus, in this situation as well, the divergence can be estimated without recourse to
nonparametric smoothing techniques, in contrast with existing methods, see
Yang (1991) and Ying (1992), which need a nonparametric estimate of the true density
function. Another feature of the proposed method is its flexibility: it leads to a
wide class of M-estimators indexed by the divergence function and by some
instrumental value of the parameter, called here the escort parameter. Relevant
choices induce efficiency and robustness properties of the proposed estimators.

The paper is organized as follows. In Section 2, we present the class of dual
divergence estimators in the censored case. Asymptotic properties of the proposed
estimators are derived in Section 3. We give a brief discussion on the choice of the
escort parameter in Section 4. In Section 5, we present Monte Carlo simulation
studies to show the performance of the proposed estimators from both the robustness
and the small sample accuracy points of view. Proofs are deferred to the Appendix.

2. Dual divergences for censored data 

The class of dual divergence estimators has been recently introduced by Keziou
(2003) and Broniatowski and Keziou (2009). In the following, we briefly recall their
context and definition.
Recall that the $\phi$-divergence between a bounded signed measure $Q$ and a
probability measure $P$ defined on the same measurable space, when $Q$ is absolutely
continuous with respect to $P$, is defined by

$$D_\phi(Q, P) := \int \phi\left( \frac{dQ}{dP} \right) dP,$$

where $\phi$ is a convex function from $]-\infty, +\infty[$ to $[0, +\infty]$ with $\phi(1) = 0$.

Well-known examples of divergences are the Kullback-Leibler, modified
Kullback-Leibler, $\chi^2$, modified $\chi^2$ and Hellinger divergences; they are
obtained respectively for $\phi(x) = x \log x - x + 1$, $\phi(x) = -\log x + x - 1$,
$\phi(x) = \frac{1}{2}(x - 1)^2$, $\phi(x) = \frac{1}{2}(x - 1)^2 / x$ and
$\phi(x) = 2(\sqrt{x} - 1)^2$. All these divergences belong to the class of the so
called "power divergences" introduced in Cressie and Read (1984) (see also Liese and
Vajda (1987), chapter 2). They are defined through the class of convex functions

$$x \in \, ]0, +\infty[ \; \mapsto \; \phi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)} \qquad (2.1)$$

if $\gamma \in \mathbb{R} \setminus \{0, 1\}$, $\phi_0(x) := -\log x + x - 1$ and
$\phi_1(x) := x \log x - x + 1$. (For all $\gamma \in \mathbb{R}$, we define
$\phi_\gamma(0) := \lim_{x \downarrow 0} \phi_\gamma(x)$.) So, the $KL$-divergence is
associated to $\phi_1$, the $KL_m$ to $\phi_0$, the $\chi^2$ to $\phi_2$, the
$\chi^2_m$ to $\phi_{-1}$ and the Hellinger distance to $\phi_{1/2}$.
We refer to Liese and Vajda (1987) for an overview on the origin of the concept of
divergences in statistics.
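
As an illustration of the family (2.1), the following minimal Python sketch evaluates $\phi_\gamma$ and checks numerically that the special cases listed above are recovered. The function name `phi_gamma` is our own choice and is not part of the original text.

```python
import math


def phi_gamma(x, gamma):
    """Cressie-Read convex function phi_gamma(x) for x > 0, following (2.1)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if gamma == 0:
        # modified Kullback-Leibler generator
        return -math.log(x) + x - 1.0
    if gamma == 1:
        # Kullback-Leibler generator
        return x * math.log(x) - x + 1.0
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))


x = 2.0
# gamma = 2 gives the chi-square generator (x - 1)^2 / 2
chi2_gap = abs(phi_gamma(x, 2) - 0.5 * (x - 1.0) ** 2)
# gamma = -1 gives the modified chi-square generator (x - 1)^2 / (2 x)
chi2m_gap = abs(phi_gamma(x, -1) - 0.5 * (x - 1.0) ** 2 / x)
# gamma = 1/2 gives the Hellinger generator 2 (sqrt(x) - 1)^2
hellinger_gap = abs(phi_gamma(x, 0.5) - 2.0 * (math.sqrt(x) - 1.0) ** 2)
print(chi2_gap, chi2m_gap, hellinger_gap)
```

All three gaps should be zero up to rounding error, and $\phi_\gamma(1) = 0$ for every $\gamma$.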

Let $X_1, \ldots, X_n$ be an i.i.d. sample with probability measure $P_{\theta_0}$.
Consider the problem of estimating the population parameter of interest $\theta_0$,
when the underlying identifiable model is given by $\{P_\theta : \theta \in \Theta\}$
with $\Theta$ a subset of $\mathbb{R}^d$.
Let $\phi$ be a strictly convex function of class $C^2$ satisfying

$$\int \left| \phi'\left( \frac{p_\theta(x)}{p_\alpha(x)} \right) \right| dP_\theta(x) < \infty. \qquad (2.2)$$

By Lemma 3.2 in Broniatowski and Keziou (2006), if the function $\phi$ satisfies the
following condition: there exists $0 < \eta < 1$ such that for all $c$ in
$[1 - \eta, 1 + \eta]$, we can find numbers $c_1, c_2, c_3$ such that

$$\phi(cx) \le c_1 \phi(x) + c_2 |x| + c_3, \quad \text{for all real } x, \qquad (2.3)$$

then assumption (2.2) is satisfied whenever $D_\phi(P_\theta, P_\alpha)$ is finite.
From now on, $\mathcal{U}$ will be the set of $\theta$ and $\alpha$ such that
$D_\phi(P_\theta, P_\alpha) < \infty$. Note that all the real convex functions
$\phi_\gamma$ pertaining to the class of power divergences defined in (2.1) satisfy
condition (2.3). Take for example the exponential distribution with density
$p_\theta(x) = \theta e^{-\theta x}$ for $x \ge 0$ and $\theta > 0$; then
$\mathcal{U} := \{\alpha, \theta > 0 : \gamma\theta + (1 - \gamma)\alpha > 0\}$.
Under (2.2), using the Fenchel duality technique, the divergence
$D_\phi(P_\theta, P_{\theta_0})$ can be represented as the result of an optimization
procedure. This elegant result was proven in Keziou (2003), Liese and Vajda (2006)
and Broniatowski and Keziou (2009). Broniatowski and Keziou (2006) called it the dual
form of a divergence, due to its connection with convex analysis.

Under the above conditions, the $\phi$-divergence

$$D_\phi(P_\theta, P_{\theta_0}) = \int \phi\left( \frac{dP_\theta}{dP_{\theta_0}} \right) dP_{\theta_0}$$

can be represented in the following form:

$$D_\phi(P_\theta, P_{\theta_0}) = \sup_{\alpha \in \mathcal{U}} \int h(\theta, \alpha) \, dP_{\theta_0}, \qquad (2.4)$$

where $h(\theta, \alpha) : x \mapsto h(\theta, \alpha, x)$ and

$$h(\theta, \alpha, x) := \int \phi'\left( \frac{p_\theta}{p_\alpha} \right) dP_\theta - \left[ \frac{p_\theta(x)}{p_\alpha(x)} \, \phi'\left( \frac{p_\theta(x)}{p_\alpha(x)} \right) - \phi\left( \frac{p_\theta(x)}{p_\alpha(x)} \right) \right]. \qquad (2.5)$$

According to Liese and Vajda (2006), under the strict convexity and the
differentiability of the function $\phi$, it holds that

$$\phi(t) \ge \phi(s) + \phi'(s)(t - s), \qquad (2.6)$$

where equality holds only for $s = t$. Now, let $\theta$ and $\theta_0$ be fixed and
put $t = p_\theta(x)/p_{\theta_0}(x)$ and $s = p_\theta(x)/p_\alpha(x)$ in (2.6);
then (2.4) follows by integrating with respect to $P_{\theta_0}$.
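
The dual representation can be checked numerically in a simple case. The sketch below is only an illustration under stated assumptions: the model is exponential, the divergence is a power divergence with $\gamma \notin \{0, 1\}$, and the closed-form expressions in `dual_objective` and `power_divergence` are our own evaluation of $\int h(\theta, \alpha) \, dP_{\theta_0}$ and $D_{\phi_\gamma}(P_\theta, P_{\theta_0})$ for this model. The function names and the grid are hypothetical choices, not the author's implementation.

```python
def dual_objective(theta, alpha, theta0, gamma):
    """Integral of h(theta, alpha, .) with respect to P_theta0 for exponential
    densities and the power divergence phi_gamma, gamma not in {0, 1}."""
    a = gamma * theta + (1.0 - gamma) * alpha   # integrability of the first term
    b = theta0 + gamma * (theta - alpha)        # integrability of the second term
    if alpha <= 0 or a <= 0 or b <= 0:
        return float("-inf")
    first = (theta ** gamma * alpha ** (1.0 - gamma) / ((gamma - 1.0) * a)
             - 1.0 / (gamma - 1.0))
    second = ((theta / alpha) ** gamma * theta0 / b - 1.0) / gamma
    return first - second


def power_divergence(theta, theta0, gamma):
    """Closed form of D_phi_gamma(P_theta, P_theta0) for exponential densities."""
    a = gamma * theta + (1.0 - gamma) * theta0
    mixed = theta ** gamma * theta0 ** (1.0 - gamma) / a
    return (mixed - 1.0) / (gamma * (gamma - 1.0))


theta, theta0, gamma = 1.5, 1.0, 0.5
grid = [0.2 + 0.001 * k for k in range(2801)]          # alpha in [0.2, 3.0]
values = [dual_objective(theta, a, theta0, gamma) for a in grid]
best = max(range(len(grid)), key=values.__getitem__)
best_alpha, best_value = grid[best], values[best]
print(best_alpha, best_value, power_divergence(theta, theta0, gamma))
```

On this grid the maximizer should sit at $\alpha = \theta_0$ and the maximal value should agree with the divergence itself, as (2.4) predicts.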

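Since the supremum in (2.4) is unique and is attained at $\alpha = \theta_0$,
independently of the value of $\theta$, we define the class of estimators of
$\theta_0$ by

$$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha \in \mathcal{U}} \int h(\theta, \alpha) \, dP_n, \quad \theta \in \Theta, \qquad (2.7)$$

where $h(\theta, \alpha)$ is the function defined in (2.5) and $P_n$ is the empirical
measure of the sample. This class is called "dual $\phi$-divergence estimators"
(D$\phi$DE's).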

Let us now turn to the estimation using divergences in our setting. In the case of
right censored data, only

$$Z = \min(X, Y) \quad \text{and} \quad \delta = \mathbf{1}_{\{X \le Y\}}$$

are observable; $\delta$ indicates whether $X$ has been censored or not. The variables
$X_i$ are randomly generated from the true distribution $P_{\theta_0}$, which is
modeled by the parametric family $\{P_\theta, \theta \in \Theta\}$. Given a set
$(Z_i, \delta_i)$, $i = 1, \ldots, n$, of independent copies of $(Z, \delta)$, our goal
is to draw some inference on the true but unknown lifetime distribution $P_{\theta_0}$.

Throughout the rest of the paper we assume that the variable of interest $X$ and
the censoring variable $Y$ are independent; $G$ denotes the unknown distribution
of the censoring time $Y$. The distribution $F$ of the observation $Z = \min(X, Y)$
satisfies

$$1 - F = (1 - P_{\theta_0})(1 - G).$$
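
As a worked illustration of this identity, assume that both the lifetime and the censoring time are exponential, with rates $\theta_0$ and $\mu$ respectively (the symbol $\mu$ is introduced only for this example). The law of $Z$ and the expected proportion of censored observations are then

```latex
\begin{align*}
1 - F(z) &= e^{-\theta_0 z}\, e^{-\mu z} = e^{-(\theta_0 + \mu) z}, \qquad z \ge 0, \\
P(\delta = 0) &= P(Y < X) = \int_0^\infty e^{-\theta_0 y}\, \mu e^{-\mu y}\, dy
              = \frac{\mu}{\theta_0 + \mu}.
\end{align*}
```

With $\theta_0 = 1$ and $\mu = 1/9$ this gives $P(\delta = 0) = 1/10$, that is, about 10% of censored observations.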

Kaplan and Meier (1958) developed a nonparametric estimator of the survival
function, which is strongly consistent for the target survival function under
appropriate conditions (see Peterson (1977), Miller (1981)). The corresponding
estimator of the distribution function is

$$\widehat{P}_n(x) = 1 - \prod_{i=1}^{n} \left( 1 - \frac{\delta_{(i)}}{n - i + 1} \right)^{\mathbf{1}_{\{Z_{(i)} \le x\}}},$$

where $(Z_{(i)}, \delta_{(i)})$, $i = 1, \ldots, n$, are the $n$ pairs of observations
ordered by the $Z_{(i)}$ and $\mathbf{1}_A$ denotes the indicator function of $A$. If
all the $\delta_i$'s are equal to 1, $\widehat{P}_n$ reduces to the ordinary
empirical distribution function $P_n$.

Thus, in the right censoring context described above, we can replace $P_n$ in (2.7)
by $\widehat{P}_n$, which provides a consistent estimator of the true distribution
function in this context. For right censored data, the "dual $\phi$-divergence
estimators" (D$\phi$DE's) are therefore defined by

$$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha \in \mathcal{U}} \int h(\theta, \alpha) \, d\widehat{P}_n, \quad \theta \in \Theta. \qquad (2.8)$$

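Following Stute (1995), the Kaplan-Meier integral $\int h(\theta, \alpha) \, d\widehat{P}_n$
may be written as

$$\sum_{i=1}^{n} W_{in} \, h(\theta, \alpha, Z_{(i)}),$$

where, for $1 \le i \le n$,

$$W_{in} = \frac{\delta_{(i)}}{n - i + 1} \prod_{j=1}^{i-1} \left( \frac{n - j}{n - j + 1} \right)^{\delta_{(j)}}.$$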

The corresponding estimating equation for the unknown parameter is then given by

$$\int \frac{\partial}{\partial \alpha} h(\theta, \alpha) \, d\widehat{P}_n = 0. \qquad (2.9)$$

Formula (2.8) defines a family of M-estimators for censored data indexed by the
function $\phi$ specifying the divergence and by some instrumental value of the
parameter $\theta$, called here the escort parameter; see also Broniatowski and
Vajda (2009). The choices of $\phi$ and $\theta$ represent a major feature of the
estimation procedure, since they induce efficiency and robustness properties.
An M-estimator of $\psi$-type is the solution of the vector equation

$$\int \psi(x; \alpha) \, d\widehat{P}_n(x) = 0, \qquad (2.10)$$

where the elements of $\psi(x; \alpha)$ represent the partial derivatives of
$h(\theta, \alpha, x)$ with respect to the components of $\alpha$.

The first extension of M-estimators to censored data is due to Reid (1981), who
derived the influence function and then the asymptotic normality. Oakes (1986)
considered M-estimators (2.10) with $\psi(x; \theta) = \frac{\partial}{\partial \theta} \log f(x; \theta)$
and called them approximate MLEs (hereafter AMLE). Wang (1995) studied the strong
consistency of M-estimators using the law of large numbers for the Kaplan-Meier
integral developed by Stute and Wang (1993) and Stute (1995). Wang (1999) extended
asymptotic results for M-estimators to the censored case.

The Hellinger distance has been used by Yang (1991) and Ying (1992). Estimation
under misspecification has been considered by Suzukawa et al. (2001). Basu et al.
(2006) developed a robust estimation procedure, adapting the robust density power
divergence methodology of Basu et al. (1998).

3. Asymptotic properties

In this section, we establish the consistency and asymptotic normality of the class
of dual divergence estimators in the right censored situation.
For a distribution $P$, let $\tau_P = \sup\{x : P(x) < 1\}$ denote the upper bound of
the support of $P$.

Assume that $\theta_0$ is an interior point of $\Theta$, the convex function $\phi$
has continuous derivatives up to 4th order and the density $p_\alpha(x)$ has
continuous partial derivatives up to 3rd order (for all $x$ $\lambda$-a.e.).
Hereafter, $\dot{p}_\alpha$ denotes the derivative with respect to $\alpha$ of
$p_\alpha$, $\|\cdot\|$ the Euclidean norm and, for a real valued function $g$, its
total variation or variation norm is defined as

$$\|g\|_v = \sup \sum_{j=1}^{N+1} |g(x_j) - g(x_{j-1})|,$$

where the supremum is taken over all $N$ and over all choices of $\{x_j\}$ such that

$$-\infty = x_0 < x_1 < \ldots < x_N < x_{N+1} = +\infty.$$

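Let $S$ be the $d \times d$ matrix with entries

$$S_{ij} := -\int \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0) \, dP_{\theta_0}, \quad 1 \le i, j \le d.$$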

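We now fix some notation for the asymptotic results of this section. The following
quantities have been introduced in Stute (1995a) and Wang (1999).
Denote $m(y) = P(\delta = 1 \mid Y = y)$ and decompose $F$ into two subdistributions
$F_0$, $F_1$ such that $F = F_0 + F_1$, where

$$F_0(y) = P(Y \le y, \delta = 0) = \int_{-\infty}^{y} (1 - m(t)) \, dF(t) = \int_{-\infty}^{y} (1 - P_{\theta_0}(t)) \, dG(t),$$

$$F_1(y) = P(Y \le y, \delta = 1) = \int_{-\infty}^{y} m(t) \, dF(t) = \int_{-\infty}^{y} (1 - G(t-)) \, dP_{\theta_0}(t),$$

and their empirical counterparts

$$F_{jn}(y) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{Z_i \le y, \, \delta_i = j\}}, \quad j = 0, 1.$$

Define

$$\xi_0(x) = \exp\left\{ \int \frac{\mathbf{1}_{\{y < x\}}}{1 - F(y)} \, dF_0(y) \right\}, \qquad (3.1)$$

and, for $i = 1, \ldots, d$,

$$\xi_{1i}(x) = \frac{1}{1 - F(x)} \int \mathbf{1}_{\{x < y\}} \frac{\partial}{\partial \alpha_i} h(\theta, \alpha, y) \, \xi_0(y) \, dF_1(y), \qquad (3.2)$$

$$\xi_{2i}(x) = \int \frac{\partial}{\partial \alpha_i} h(\theta, \alpha, z) \, \xi_0(z) \, C(x \wedge z) \, dF_1(z), \qquad (3.3)$$

where

$$C(x) = \int \frac{\mathbf{1}_{\{y < x\}}}{[1 - F(y)]^2} \, dF_0(y) = \int \frac{\mathbf{1}_{\{y < x\}}}{[1 - P_{\theta_0}(y)][1 - G(y)]^2} \, dG(y). \qquad (3.4)$$

Let $U(\alpha) = (U_1, \ldots, U_d)^T$ denote the random vector defined as

$$U_i(\alpha) = \frac{\partial}{\partial \alpha_i} h(\theta, \alpha, Y) \, \xi_0(Y) \, \delta + \xi_{1i}(Y)(1 - \delta) - \xi_{2i}(Y), \quad i = 1, \ldots, d. \qquad (3.5)$$

When $\alpha = \theta_0$,

$$U_i(\theta_0) = \frac{\partial}{\partial \alpha_i} h(\theta, \theta_0, Y) \, \xi_0(Y) \, \delta + \xi_{1i}(Y)(1 - \delta) - \xi_{2i}(Y), \quad i = 1, \ldots, d.$$

Denote by $V$ the $d \times d$ matrix

$$V = E\left( U(\theta_0) U(\theta_0)^T \right). \qquad (3.6)$$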

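3.1. Consistency. In Theorem 1 below, we prove that the estimates
$\widehat{\alpha}_\phi(\theta)$ exist and are consistent. We will consider the
following conditions.

(R.0) $\tau_{P_{\theta_0}} \le \tau_G$, where equality may hold except when $G$ is
continuous at $\tau_{P_{\theta_0}}$ and the probability mass of $P_{\theta_0}$ at
$\tau_{P_{\theta_0}}$ is positive, that is
$P_{\theta_0}(\tau_{P_{\theta_0}}) - P_{\theta_0}(\tau_{P_{\theta_0}}-) > 0$;

(R.1) There exists a neighborhood $N(\theta_0)$ of $\theta_0$ such that the first and
second order partial derivatives (w.r.t. $\alpha$) of
$\phi'(p_\theta(x)/p_\alpha(x)) \, p_\theta(x)$ are dominated on $N(\theta_0)$ by some
integrable functions. The third order partial derivatives (w.r.t. $\alpha$) of
$h(\theta, \alpha, x)$ are dominated on $N(\theta_0)$ by some
$P_{\theta_0}$-integrable functions, and the matrices $S$ and $V$ are non singular;

(R.2) $\left\| \frac{\partial}{\partial \alpha} h(\theta, \alpha) \right\|_v < \infty$.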

These conditions are mild and can be satisfied in most circumstances. Condition
(R.0) ensures that $X$ is observable on the whole of the support of $P_{\theta_0}$.
Note that if $\tau_{P_{\theta_0}} > \tau_G$ holds, any $X_i$ in $[\tau_G, \infty)$ is
certainly censored. In a large number of practical situations,
$\tau_{P_{\theta_0}} = \tau_G = \infty$, hence condition (R.0) is satisfied.
Condition (R.1) concerns usual regularity properties of the underlying model. It
guarantees that we can interchange integration and differentiation, as well as the
existence of the variance-covariance matrices; it is similar to the regularity
conditions used in Keziou (2003) and Broniatowski and Keziou (2009) in the
uncensored case.

Condition (R.2) is needed to apply the law of the iterated logarithm (LIL) in the
proof of Theorem 1. The requirement that
$\psi(x; \alpha) := \frac{\partial}{\partial \alpha} h(\theta, \alpha)$ be of bounded
variation is standard in M-estimation, see for instance Welsh (1989). Given the
regularity conditions assumed on the criterion function, that is, $h(\theta, \alpha)$
in the present framework, it holds for most regular models.

Note also that conditions (R.1) and (R.2) are independent of $G$.

Theorem 1. Let $B(\theta_0, n^{-1/3}) := \{\theta \in \Theta, \|\theta - \theta_0\| \le n^{-1/3}\}$.
Assume that conditions (R.0-2) hold. Then, as $n$ tends to infinity, with probability
one, the function $\alpha \mapsto \int h(\theta, \alpha) \, d\widehat{P}_n$ attains its
local maximum at some point $\widehat{\alpha}_\phi(\theta)$ in the interior of
$B(\theta_0, n^{-1/3})$, which implies that the estimate
$\widehat{\alpha}_\phi(\theta)$ is consistent and satisfies

$$\int \frac{\partial}{\partial \alpha} h(\theta, \widehat{\alpha}_\phi(\theta)) \, d\widehat{P}_n = 0.$$

The proof of Theorem 1 is postponed to the Appendix.

In practice, to obtain the estimate $\widehat{\alpha}_\phi(\theta)$, we use gradient
descent algorithms to solve (2.9). These algorithms depend on some initial value of
$\alpha$. Hence, it is desirable to prove that in a neighborhood of $\theta_0$ there
exists a maximum of $\int h(\theta, \alpha) \, d\widehat{P}_n$ which does indeed
converge to $\theta_0$. Note that the initial parameter value may lead to a local
(not necessarily global) maximum of $\int h(\theta, \alpha) \, d\widehat{P}_n$. The
[...] concave and $\Theta$ is convex, see for instance Broniatowski and Keziou
(2009, Remark 3.5).
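
The numerical step can be sketched as follows for the one parameter exponential model with a power divergence. This is a minimal illustration, not the author's code: it assumes no ties in the data, uses the closed-form Kaplan-Meier criterion for exponential densities, and replaces the gradient descent mentioned above by a derivative-free golden-section search, which only finds a local maximum inside the chosen bracket. All function names and the toy data are hypothetical.

```python
import math


def km_weights(z, delta):
    """Ordered observations and Kaplan-Meier weights (no ties assumed)."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    zs, ds = [z[i] for i in order], [delta[i] for i in order]
    weights, surv = [], 1.0
    for i in range(1, n + 1):
        weights.append(surv * ds[i - 1] / (n - i + 1))
        surv *= ((n - i) / (n - i + 1)) ** ds[i - 1]
    return zs, weights


def criterion(alpha, theta, zs, w, gamma):
    """Kaplan-Meier integral of h(theta, alpha, .) for p_t(x) = t * exp(-t * x)."""
    if alpha <= 0 or gamma * theta + (1.0 - gamma) * alpha <= 0:
        return float("-inf")  # outside the admissible set
    if gamma == 0:
        return sum(wi * ((theta - alpha) * zi - math.log(theta / alpha))
                   for zi, wi in zip(zs, w))
    if gamma == 1:
        tail = sum(wi * (theta / alpha * math.exp(-(theta - alpha) * zi) - 1.0)
                   for zi, wi in zip(zs, w))
        return math.log(theta / alpha) - (theta - alpha) / theta - tail
    lead = (theta ** gamma * alpha ** (1.0 - gamma)
            / ((gamma - 1.0) * (gamma * theta + (1.0 - gamma) * alpha))
            - 1.0 / (gamma - 1.0))
    tail = sum(wi * ((theta / alpha) ** gamma
                     * math.exp(-gamma * (theta - alpha) * zi) - 1.0)
               for zi, wi in zip(zs, w))
    return lead - tail / gamma


def golden_max(f, lo, hi, tol=1e-9):
    """Golden-section search for a local maximizer of f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def dphide_exponential(z, delta, theta, gamma, lo=0.05, hi=10.0):
    """Dual phi-divergence estimate for escort theta, searched on [lo, hi]."""
    zs, w = km_weights(z, delta)
    return golden_max(lambda a: criterion(a, theta, zs, w, gamma), lo, hi)


# toy data, chosen only for illustration
z = [0.1, 0.3, 0.5, 0.8, 1.2, 1.9, 2.5]
delta = [1, 1, 0, 1, 1, 0, 1]
print(dphide_exponential(z, delta, theta=1.0, gamma=0.5))
```

The bracket `[lo, hi]` plays the role of the initial value: a different bracket may lead to a different local maximum, which is exactly the caveat raised in the paragraph above.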

The aim of Theorem 1 is not to establish the optimal rate of the estimate but
merely its existence and (a.s.) consistency. We have considered the rate $n^{-1/3}$
because, in the Taylor expansion (A.3) in the proof, the third term of the right
hand side is $O(1)$ only for this rate, which is the key to the argument. For
similar arguments in the estimation of copula models, see Bouzebda and Keziou (2010).



3.2. Asymptotic normality. In Theorem 2 below, we give the limit law of the
estimates $\widehat{\alpha}_\phi(\theta)$ under the following conditions. From now
on, $\xrightarrow{d}$ denotes convergence in distribution.

(R.3) For all $1 \le i \le d$,
$E\left[ \left( \frac{\partial}{\partial \alpha_i} h(\theta, \alpha, Y) \, \xi_0(Y) \, \delta \right)^2 \right] < \infty$;

(R.4) For all $1 \le i \le d$,
$\int \left| \frac{\partial}{\partial \alpha_i} h(\theta, \alpha, x) \right| C^{1/2}(x) \, dP_{\theta_0} < \infty$.

Conditions (R.3-4) are essential for the asymptotic results on M-estimators in the
censored case, see for instance Wang (1999), and Basu et al. (2006) in the case of
the density power divergence method.

Theorem 2. Assume that assumptions (R.0-4) hold. Then, as $n \to \infty$,

$$\sqrt{n} \left( \widehat{\alpha}_\phi(\theta) - \theta_0 \right) \xrightarrow{d} \mathcal{N}\left( 0, S^{-1} V S^{-1} \right).$$

The proof of Theorem 2 is postponed to the Appendix.

4. Adaptive choice of the escort parameter 

As in the uncensored case, the very particular choice of the escort parameter
$\theta = \theta_0$ yields the same limit properties as the AMLE. In this case the
D$\phi$DE $\widehat{\alpha}_\phi(\theta_0)$ has a variance which coincides with that
of the AMLE for censored data. If $\theta$ is a real parameter, the asymptotic
distribution of $\sqrt{n} \left( \widehat{\alpha}_\phi(\theta_0) - \theta_0 \right)$
is normal with mean zero and variance

PlW dx [ _W dx 
p 0o (x)G(x) J P 9o (x)G 2 (x) 

where $\dot{p}_\theta$ is the derivative with respect to $\theta$ of $p_\theta$ and
$I_\theta$ is the Fisher information matrix

$$I_\theta := \int \frac{\dot{p}_\theta \, \dot{p}_\theta^T}{p_\theta} \, d\lambda.$$

Observe that if there is no censoring, that is $G = 0$, the variance of
$\widehat{\alpha}_\phi(\theta_0)$ is $1/I_{\theta_0}$.

This result is of some relevance, since it leaves open the choice of the divergence,
while keeping good asymptotic properties.

In practice, the consequence is that the escort parameter should be chosen as the
AML estimator of $\theta_0$, say $\widetilde{\theta}_n$, which under the model is a
consistent estimate of $\theta_0$. In turn, we may expect the resulting estimator
$\widehat{\alpha}_\phi(\widetilde{\theta}_n)$ to inherit good asymptotic properties
under the model and, through a tuning of the divergence index $\gamma$, good
behaviour under contamination.

Consider the power divergences family of Cressie and Read (1984); the estimating
equation (2.9) then reduces to

$$-\int \left( \frac{p_\theta(x)}{p_\alpha(x)} \right)^{\gamma - 1} \frac{\dot{p}_\alpha(x)}{p_\alpha(x)} \, p_\theta(x) \, dx + \sum_{i=1}^{n} W_{in} \left( \frac{p_\theta(Z_{(i)})}{p_\alpha(Z_{(i)})} \right)^{\gamma} \frac{\dot{p}_\alpha(Z_{(i)})}{p_\alpha(Z_{(i)})} = 0, \qquad (4.2)$$

where the $W_{in}$ are the Kaplan-Meier weights. The estimate
$\widehat{\alpha}_\phi(\theta)$ is the solution in $\alpha$ of (4.2).
An improvement of the present estimate results from plugging in a preliminary
consistent estimate of $\theta_0$, say $\widetilde{\theta}_n$, as an adaptive choice
of the escort parameter $\theta$.




Figure 1. Behaviour of the ratio $p_{\widetilde{\theta}_n}(x) / p_{\theta_0}(x)$
under contamination, for a randomly generated exponential sample exp(1) of size 100
with exp(1/9) as censoring distribution and 20% of contamination by exp(0.1).

Let $x$ be some outlier. The role of the outlier $x$ in (4.2) appears in the term

$$\left( \frac{p_\theta(x)}{p_\alpha(x)} \right)^{\gamma} \frac{\dot{p}_\alpha(x)}{p_\alpha(x)}. \qquad (4.3)$$

The estimate $\widehat{\alpha}_\phi(\theta)$ is robust if this term is stable, that
is, if it is small when $\alpha$ is near $\theta_0$. If the escort parameter
$\widetilde{\theta}_n$ is not a robust estimator, the ratio
$p_{\widetilde{\theta}_n}(x) / p_{\theta_0}(x)$ can be very large, see Figure 1. This
is due to the fact that the outlier $x$ will be more likely under
$P_{\widetilde{\theta}_n}$; that is, $\widetilde{\theta}_n$ will lead to an
overevaluation of $p_{\widetilde{\theta}_n}(x)$ with respect to the expected value
under $\theta_0$, namely $p_{\theta_0}(x)$. To guard against such situations, one may
compensate through the choice of $\gamma$; this requires further investigation.
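
The growth of the density ratio is easy to see in the exponential model, where it has a closed form. The short sketch below is only an illustration: the escort value `theta_tilde = 0.5` is a hypothetical number standing for an estimate dragged below $\theta_0 = 1$ by long-tailed outliers, and the function name is ours.

```python
import math


def density_ratio(x, theta_tilde, theta0):
    """p_{theta_tilde}(x) / p_{theta0}(x) for densities theta * exp(-theta * x)."""
    return (theta_tilde / theta0) * math.exp(-(theta_tilde - theta0) * x)


theta0 = 1.0
theta_tilde = 0.5  # hypothetical non-robust escort value, below theta0
for x in (1.0, 5.0, 10.0, 20.0):
    print(x, density_ratio(x, theta_tilde, theta0))
```

When the escort underestimates the rate, the ratio grows exponentially in $x$, so a single large outlier can dominate the term (4.3) unless $\gamma$ is chosen to damp it.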

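One proposal for the choice of the divergence is to look for values of the tuning
parameter $\gamma$ that yield a bounded influence function, in the spirit of Toma and
Broniatowski (2010); we leave this issue open for future research.
We now prove that the subsequent estimator
$\widehat{\alpha}_\phi(\widetilde{\theta}_n)$ enjoys a limit normal law under the
model, see Theorem 3 below.

Recall that, when $\theta = \theta_0$, $S = \phi''(1) I_{\theta_0}$. Also, when
$\alpha = \theta = \theta_0$, we have

$$U = \phi''(1) \frac{\dot{p}_{\theta_0}}{p_{\theta_0}}(Y) \, \xi_0(Y) \, \delta + \xi_1(Y)(1 - \delta) - \xi_2(Y),$$

and the matrix $V$ defined in (3.6) is

$$V = E\left( U U^T \right). \qquad (4.4)$$

(R.5) For all $1 \le i, j \le d$, any one of the following conditions holds:

(i) $\theta \mapsto \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0, x)$
is continuous at $\theta_0$ uniformly in $x$;

(ii) $\int \sup_{\{\theta : |\theta - \theta_0| \le \rho\}} \left| \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0) - \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta_0, \theta_0) \right| dP_{\theta_0} = \epsilon_\rho \to 0$
as $\rho \to 0$;

(iii) $x \mapsto \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0, x)$
is continuous in $x$ for $\theta$ in a neighborhood of $\theta_0$ and
$\lim_{\theta \to \theta_0} \left\| \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0, \cdot) - \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta_0, \theta_0, \cdot) \right\|_v = 0$;

(iv) $\theta \mapsto \int \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0) \, dP_{\theta_0}$
is continuous at $\theta = \theta_0$, and
$x \mapsto \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0, x)$
is continuous in $x$ for $\theta$ in a neighborhood of $\theta_0$ and
$\lim_{\theta \to \theta_0} \left\| \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0, \cdot) - \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta_0, \theta_0, \cdot) \right\|_v < \infty$;

(v) $\theta \mapsto \int \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0) \, dP_{\theta_0}$
is continuous at $\theta = \theta_0$, and
$\int \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0) \, d\widehat{P}_n \xrightarrow{P} \int \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta, \theta_0) \, dP_{\theta_0} < \infty$,
uniformly for $\theta$ in a neighborhood of $\theta_0$.

Condition (R.5) is related to Lemma 1 in Wang (1999) and ensures the convergence

$$\int \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\widetilde{\theta}_n, \theta_0) \, d\widehat{P}_n \xrightarrow{P} \int \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta_0, \theta_0) \, dP_{\theta_0}, \quad 1 \le i, j \le d,$$

provided that
$\int \left| \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} h(\theta_0, \theta_0) \right| dP_{\theta_0} < \infty$,
$1 \le i, j \le d$, $\widetilde{\theta}_n \xrightarrow{P} \theta_0$ and condition
(R.0) holds.
Theorem 3. Assume that assumptions (R.0-5) hold. Then, as $n \to \infty$,

$$\sqrt{n} \left( \widehat{\alpha}_\phi(\widetilde{\theta}_n) - \theta_0 \right) \xrightarrow{d} \mathcal{N}\left( 0, \phi''(1)^{-2} I_{\theta_0}^{-1} V I_{\theta_0}^{-1} \right),$$

where $V$ is defined in (4.4).

The proof of Theorem 3 is postponed to the Appendix.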

5. Simulation 

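In this section, we present the results of a simulation study conducted to explore
the properties of the newly proposed dual $\phi$-divergence estimators (D$\phi$DE).
These estimators are compared with other methods, including the maximum likelihood
estimator (MLE), the approximate maximum likelihood estimator (AMLE) and estimators
based on the density power divergence method (MDPDE).
Following Stute (1995), the Kaplan-Meier integral $\int h(\theta, \alpha) \, d\widehat{P}_n$
may be written as

$$\sum_{i=1}^{n} W_{in} \, h(\theta, \alpha, Z_{(i)}),$$

where, for $1 \le i \le n$,

$$W_{in} = \frac{\delta_{(i)}}{n - i + 1} \prod_{j=1}^{i-1} \left( \frac{n - j}{n - j + 1} \right)^{\delta_{(j)}}.$$

Figure 2 presents the Kaplan-Meier estimator of the survival function for a
randomly generated exponential sample exp(1) of size 100 with exp(1/9) as censoring
distribution.

Figure 2. Kaplan-Meier estimator of the survival function with confidence intervals.

In this simulation study we use the power divergences family of Cressie and Read
(1984). In this case

$$\int h(\theta, \alpha) \, d\widehat{P}_n = \frac{1}{\gamma - 1} \int \left( \frac{p_\theta}{p_\alpha} \right)^{\gamma - 1} dP_\theta - \frac{1}{\gamma} \int \left[ \left( \frac{p_\theta}{p_\alpha} \right)^{\gamma} - 1 \right] d\widehat{P}_n - \frac{1}{\gamma - 1}.$$

Consider the lifetime distribution to be the one parameter exponential exp($\theta$)
with density $p_\theta(x) = \theta e^{-\theta x}$, $x \ge 0$. The MLE of $\theta$ is
given by

$$\widehat{\theta}_{n,MLE} = \frac{\sum_{j=1}^{n} \delta_j}{\sum_{j=1}^{n} Z_j}, \qquad (5.1)$$

and the AMLE of Oakes (1986) is defined by

$$\widehat{\theta}_{n,AMLE} = \frac{\sum_{i=1}^{n} W_{in}}{\sum_{i=1}^{n} W_{in} Z_{(i)}}.$$

It follows that for $\gamma \in \mathbb{R} \setminus \{0, 1\}$

$$\frac{1}{\gamma - 1} \int \left( \frac{p_\theta}{p_\alpha} \right)^{\gamma - 1} dP_\theta = \frac{\theta^\gamma \alpha^{1 - \gamma}}{(\gamma - 1)[\gamma\theta + (1 - \gamma)\alpha]},$$

and

$$\int h(\theta, \alpha) \, d\widehat{P}_n = \frac{\theta^\gamma \alpha^{1 - \gamma}}{(\gamma - 1)[\gamma\theta + (1 - \gamma)\alpha]} - \frac{1}{\gamma} \sum_{i=1}^{n} W_{in} \left[ \left( \frac{\theta}{\alpha} \right)^{\gamma} \exp\left\{ -\gamma(\theta - \alpha) Z_{(i)} \right\} - 1 \right] - \frac{1}{\gamma - 1}.$$

For $\gamma = 0$,

$$\int h(\theta, \alpha) \, d\widehat{P}_n = \sum_{i=1}^{n} W_{in} \left[ (\theta - \alpha) Z_{(i)} - \log\left( \frac{\theta}{\alpha} \right) \right]. \qquad (5.2)$$

Observe that this divergence leads to the AMLE, independently of the value of
$\theta$.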
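
This remark can be checked directly. The following short derivation is a sketch that uses only (5.2) and the Kaplan-Meier weights $W_{in}$: differentiating (5.2) with respect to $\alpha$ and setting the derivative to zero gives

```latex
\begin{align*}
\frac{\partial}{\partial \alpha} \sum_{i=1}^{n} W_{in}
  \left[ (\theta - \alpha) Z_{(i)} - \log\theta + \log\alpha \right]
  &= \sum_{i=1}^{n} W_{in} \left( \frac{1}{\alpha} - Z_{(i)} \right) = 0 \\
\Longrightarrow \quad \widehat{\alpha}_{\phi_0}(\theta)
  &= \frac{\sum_{i=1}^{n} W_{in}}{\sum_{i=1}^{n} W_{in} Z_{(i)}},
\end{align*}
```

which does not involve $\theta$. The second derivative, $-\alpha^{-2} \sum_{i} W_{in}$, is negative as soon as one weight is positive, so this stationary point is a maximum.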
For $\gamma = 1$,

$$\int h(\theta, \alpha) \, d\widehat{P}_n = \log\left( \frac{\theta}{\alpha} \right) - \frac{\theta - \alpha}{\theta} - \sum_{i=1}^{n} W_{in} \left[ \frac{\theta}{\alpha} \exp\left( -(\theta - \alpha) Z_{(i)} \right) - 1 \right].$$

For comparison, besides the dual $\phi$-divergence estimators, we considered the
minimum density power divergence estimators (MDPDE's) of Basu et al. (2006). Recall
that the density power divergence between $g$ and another density $f$ is

$$d_\beta(g, f) = \int \left\{ f^{1+\beta}(z) - \left( 1 + \frac{1}{\beta} \right) g(z) f^{\beta}(z) + \frac{1}{\beta} g^{1+\beta}(z) \right\} dz \quad \text{for } \beta > 0.$$

The values of $\gamma$ are chosen to be $-1, 0, 0.5, 1, 2$, which correspond to the
well known standard divergences: the $\chi^2_m$-divergence, $KL_m$, the Hellinger
distance, $KL$ and the $\chi^2$-divergence, respectively. For the MDPDE's we take
the following values of $\beta$: 0.1, 0.5, 1.

A sample is generated from exp(1) and 0, 10, 25 of the observations are
contaminated by exp(5) successively. We have used an exponential censoring scheme:
the censoring distribution is taken to be exp(1/9), so that the proportion of
censoring is 10%. The D$\phi$DE's $\widehat{\alpha}_\phi(\theta)$ are calculated for
samples of sizes 25, 50, 75, 100 and the whole procedure is repeated 1000 times. The
value of the escort parameter $\theta$ is taken to be the AMLE. We carried out the
Kaplan-Meier analysis with the survival package (Therneau and original R port by
Thomas Lumley, 2009) within the R language (R Development Core Team, 2009).

Tables 1 and 2 provide the MSE of the various estimates under the model, for an
increasing proportion of censoring. As expected, when there is no contamination, the
MLE produces the most efficient estimates. A close look at the simulation results
shows that the D$\phi$DE's perform well under the model, when no outliers are
generated. For small sample sizes, $n = 25$ and $n = 50$, the performance of the
estimator under the model is comparable to that of the MDPDE's. Indeed, in terms of
empirical MSE, the D$\phi$DE with $\gamma = -1$ produces a lower MSE than the
MDPDE's for all considered values of $\beta$. As $n$ grows, the MDPDE's prevail.
Thus, the D$\phi$DE's are shown to be an attractive alternative to both the AMLE and
the MDPDE's in these settings.
Table 1. MSE of the estimates with 10% of censoring

n                 25      50      75      100     150     200
MLE               0.0572  0.0250  0.0157  0.0122  0.0079  0.0058
$\gamma = -1$     0.0517  0.0335  0.0188  0.0178  0.0100  0.0090
$\gamma = 0$      0.0685  0.0281  0.0166  0.0135  0.0084  0.0062
$\gamma = 0.5$    0.0727  0.0287  0.0168  0.0138  0.0085  0.0063
$\gamma = 1$      0.0824  0.0302  0.0174  0.0143  0.0086  0.0063
$\gamma = 2$      0.2533  0.1156  0.0597  0.0436  0.0151  0.0084
$\beta = 0.1$     0.0643  0.0272  0.0162  0.0131  0.0083  0.0061
$\beta = 0.5$     0.0772  0.0368  0.0209  0.0173  0.0112  0.0083
$\beta = 1$       0.1042  0.0506  0.0279  0.0232  0.0154  0.0108

Table 2. MSE of the estimates with 20% of censoring

n                 25      50      75      100     150     200
MLE               0.0627  0.0280  0.0174  0.0134  0.0088  0.0068
$\gamma = -1$     0.0655  0.0395  0.0262  0.0195  0.0154  0.0138
$\gamma = 0$      0.0892  0.0395  0.0248  0.0172  0.0113  0.0083
$\gamma = 0.5$    0.0991  0.0440  0.0273  0.0184  0.0119  0.0087
$\gamma = 1$      0.1268  0.0541  0.0336  0.0213  0.0131  0.0094
$\gamma = 2$      0.3703  0.2233  0.1919  0.1391  0.0689  0.0510
$\beta = 0.1$     0.0816  0.0362  0.0224  0.0155  0.0102  0.0075
$\beta = 0.5$     0.0919  0.0420  0.0247  0.0171  0.0119  0.0085
$\beta = 1$       0.1166  0.0559  0.0318  0.0218  0.0162  0.0110



We now turn to the comparison of these various estimators under contamination.
The D$\phi$DE's clearly yield the most robust estimates and outperform the MLE
substantially. We can see from Tables 3 and 4 that the D$\phi$DE with $\gamma = -1$
has the smallest MSE among all the D$\phi$DE's and the MDPDE's for all considered
values of $\beta$. As $n$ increases, all the D$\phi$DE's compare favorably with the
MDPDE's for all $\beta$.

Table 3. MSE of the estimates with 20% of contamination and 10% of censoring

n                 25      50        75      100     150     200
MLE               0.2413  0.1354    0.0975  0.0916  0.0798  0.0771
$\gamma = -1$     0.0576  0.0617    0.0620  0.0626  0.0605  0.0627
$\gamma = 0$      0.0852  0.0J 512  0.0709  0.0710  0.0666  0.0674
$\gamma = 0.5$    0.0860  O.Oi 520  0.0717  0.0718  0.0676  0.0683
$\gamma = 1$      0.0872  0.01 526  0.0723  0.0724  0.0682  0.0689
$\gamma = 2$      0.0939  0.0J 543  0.0738  0.0735  0.0692  0.0697
$\beta = 0.1$     0.0904  0.0905    0.0829  0.0835  0.0834  0.0854
$\beta = 0.5$     0.1134  0.1237    0.1243  0.1269  0.1369  0.1405
$\beta = 1$       0.1231  0.1372    0.1424  0.1449  0.1524  0.1547

Table 4. MSE of the estimates with 20% of contamination and 20% of censoring

n                 25      50      75      100     150     200
MLE               0.2785  0.1629  0.1165  0.1081  0.0962  0.0926
$\gamma = -1$     0.0624  0.0661  0.0674  0.0684  0.0670  0.0689
$\gamma = 0$      0.0943  0.0898  0.0811  0.0796  0.0751  0.0758
$\gamma = 0.5$    0.0957  0.0914  0.0826  0.0809  0.0768  0.0774
$\gamma = 1$      0.0975  0.0928  0.0840  0.0820  0.0781  0.0784
$\gamma = 2$      0.1076  0.0971  0.0872  0.0845  0.0801  0.0801
$\beta = 0.1$     0.0963  0.0967  0.0891  0.0884  0.0881  0.0900
$\beta = 0.5$     0.1127  0.1235  0.1226  0.1241  0.1335  0.1369
$\beta = 1$       0.1225  0.1348  0.1391  0.1409  0.1503  0.1523






MOHAMED CHERFI 



In the case of long-tailed contamination in the form of an exp(0.1) distribution,
simulation results (not reported in this paper) indicate that the MDPDE's are
more robust than our proposed estimators.

In conclusion, without contamination the D$\phi$DE's show good small sample
performance, comparable to that of the AMLE and the MDPDE's. For medium and
large sample sizes the MDPDE's are preferable. Under main body contamination,
the D$\phi$DE's are more robust.



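6. Concluding remarks

We have introduced a new estimation procedure for parametric models in the case
of right censored data. The method is based on the dual representation of
$\phi$-divergences. The estimators are easily computed and exhibit appropriate
asymptotic behaviour.

We have presented an adaptive choice of the escort parameter $\theta$ that leads to
efficient and robust estimates. It would be interesting to investigate theoretically
the choice of the divergence which leads to an "optimal" estimate in terms of
efficiency and robustness. One approach is to minimize an estimated asymptotic
mean squared error of the estimator when it is mathematically tractable, which is
not an easy task in the context of censored data and lies beyond the scope of the
present work.

Appendix A. Proofs

A.1. Proof of Theorem 1. Under assumptions (R.0) and (R.1), and by applying the
Strong Law of Large Numbers (SLLN) for censored data, see for instance
Stute and Wang (1993), Stute (1995) and Proposition 1 in Wang (1999), we can see
that

$$\int \frac{\partial}{\partial \alpha} h(\theta, \theta_0) \, d\widehat{P}_n \to \int \frac{\partial}{\partial \alpha} h(\theta, \theta_0) \, dP_{\theta_0} = 0 \quad (a.s.) \qquad (A.1)$$

and

$$\int \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta, \theta_0) \, d\widehat{P}_n \to \int \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta, \theta_0) \, dP_{\theta_0} = -S < \infty \quad (a.s.). \qquad (A.2)$$

Now, for any $\alpha = \theta_0 + u n^{-1/3}$, with $\|u\| \le 1$, consider a Taylor
expansion of $\int h(\theta, \alpha) \, d\widehat{P}_n$ in $\alpha$ in a neighborhood
of $\theta_0$. Using (R.1), one finds

$$n \int h(\theta, \alpha) \, d\widehat{P}_n - n \int h(\theta, \theta_0) \, d\widehat{P}_n = n^{2/3} u^T \int \frac{\partial}{\partial \alpha} h(\theta, \theta_0) \, d\widehat{P}_n + \frac{n^{1/3}}{2} u^T \left[ \int \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta, \theta_0) \, d\widehat{P}_n \right] u + O(1), \qquad (A.3)$$

uniformly in $u$ with $\|u\| \le 1$. Observe that

$$\left| \int \frac{\partial}{\partial \alpha} h(\theta, \theta_0) \, d[\widehat{P}_n - P_{\theta_0}] \right| \le \sup_x \left| \widehat{P}_n(x) - P_{\theta_0}(x) \right| \, \left\| \frac{\partial}{\partial \alpha} h(\theta, \theta_0) \right\|_v.$$

On the other hand, under condition (R.2), by the LIL of Foldes and Rejto (1981),
we have

$$\int \frac{\partial}{\partial \alpha} h(\theta, \theta_0) \, d\widehat{P}_n = O\left( n^{-1/2} (\log\log n)^{1/2} \right).$$

Therefore, using (A.1) and (A.2), we obtain for any $\alpha = \theta_0 + u n^{-1/3}$,
with $\|u\| = 1$,

$$n \int h(\theta, \alpha) \, d\widehat{P}_n - n \int h(\theta, \theta_0) \, d\widehat{P}_n = O\left( n^{1/6} (\log\log n)^{1/2} \right) - \frac{n^{1/3}}{2} u^T S u + O(1).$$

Observe that the right-hand side vanishes when $\alpha = \theta_0$, and that the
left-hand side, by (A.2), becomes negative for all $n$ sufficiently large. Thus, by
the continuity of $\alpha \mapsto \int h(\theta, \alpha) \, d\widehat{P}_n$, it holds
that as $n \to \infty$, with probability one,

$$\alpha \mapsto \int h(\theta, \alpha) \, d\widehat{P}_n$$

reaches its maximum value at some point $\widehat{\alpha}_\phi(\theta)$ in the
interior of $B(\theta_0, n^{-1/3})$. Therefore, the estimate
$\widehat{\alpha}_\phi(\theta)$ satisfies

$$\int \frac{\partial}{\partial \alpha} h(\theta, \widehat{\alpha}_\phi(\theta)) \, d\widehat{P}_n = 0 \quad \text{and} \quad \left\| \widehat{\alpha}_\phi(\theta) - \theta_0 \right\| = O(n^{-1/3}).$$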



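A.2. Proof of Theorem 2. Using (R.1), simple calculus gives

$$P_{\theta_0} \frac{\partial}{\partial \alpha} h(\theta, \theta_0) = 0 \qquad (A.4)$$

and

$$P_{\theta_0} \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta, \theta_0) = -S. \qquad (A.5)$$

Observe that the matrix $S$ is symmetric and positive since the second derivative
$\phi''$ is nonnegative by the convexity of $\phi$. Let
$U_n(\theta_0) := \widehat{P}_n \frac{\partial}{\partial \alpha} h(\theta, \theta_0)$,
and use (A.4) together with (R.0), (R.3) and (R.4) in connection with the Central
Limit Theorem (CLT) for censored data, see for instance Stute (1995a) and Wang
(1999), to see that

$$\sqrt{n} \, U_n(\theta_0) \xrightarrow{d} \mathcal{N}(0, V). \qquad (A.6)$$

Also, let
$S_n(\theta_0) := -\widehat{P}_n \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta, \theta_0)$,
and use (A.5) and (R.0) in connection with the SLLN to conclude that

$$S_n(\theta_0) \to S \quad (a.s.). \qquad (A.7)$$

Using the fact that
$\widehat{P}_n \frac{\partial}{\partial \alpha} h(\theta, \widehat{\alpha}_\phi(\theta)) = 0$
and a Taylor expansion of
$\widehat{P}_n \frac{\partial}{\partial \alpha} h(\theta, \widehat{\alpha}_\phi(\theta))$
in $\widehat{\alpha}_\phi(\theta)$ around $\theta_0$, we obtain

$$0 = \widehat{P}_n \frac{\partial}{\partial \alpha} h(\theta, \widehat{\alpha}_\phi(\theta)) = \widehat{P}_n \frac{\partial}{\partial \alpha} h(\theta, \theta_0) + \left( \widehat{\alpha}_\phi(\theta) - \theta_0 \right)^T \widehat{P}_n \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta, \theta_0) + o_P\left( \frac{1}{\sqrt{n}} \right).$$

Hence,

$$\sqrt{n} \left( \widehat{\alpha}_\phi(\theta) - \theta_0 \right) = S_n(\theta_0)^{-1} \sqrt{n} \, U_n(\theta_0) + o_P(1). \qquad (A.8)$$

Using (A.6), (A.7) and Slutsky's theorem, we conclude that

$$\sqrt{n} \left( \widehat{\alpha}_\phi(\theta) - \theta_0 \right) \xrightarrow{d} \mathcal{N}\left( 0, S^{-1} V S^{-1} \right). \qquad (A.9)$$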

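A.3. Proof of Theorem 3. By a Taylor expansion of
$\widehat{P}_n \frac{\partial}{\partial \alpha} h(\widetilde{\theta}_n, \widehat{\alpha}_\phi(\widetilde{\theta}_n))$
in $\widehat{\alpha}_\phi(\widetilde{\theta}_n)$ around $\theta_0$, we obtain

$$0 = \widehat{P}_n \frac{\partial}{\partial \alpha} h(\widetilde{\theta}_n, \widehat{\alpha}_\phi(\widetilde{\theta}_n)) = \widehat{P}_n \frac{\partial}{\partial \alpha} h(\widetilde{\theta}_n, \theta_0) + \left( \widehat{\alpha}_\phi(\widetilde{\theta}_n) - \theta_0 \right)^T \widehat{P}_n \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\widetilde{\theta}_n, \theta_0) + o_P\left( \frac{1}{\sqrt{n}} \right).$$

Taylor expansions of
$\widehat{P}_n \frac{\partial}{\partial \alpha} h(\widetilde{\theta}_n, \widehat{\alpha}_\phi(\widetilde{\theta}_n))$
and $\widehat{P}_n \frac{\partial}{\partial \alpha} h(\widetilde{\theta}_n, \theta_0)$
in $\widetilde{\theta}_n$ around $\theta_0$, and the $\sqrt{n}$-consistency of
$\widetilde{\theta}_n$ for $\theta_0$, yield

$$0 = \widehat{P}_n \frac{\partial}{\partial \alpha} h(\widetilde{\theta}_n, \widehat{\alpha}_\phi(\widetilde{\theta}_n)) = \widehat{P}_n \frac{\partial}{\partial \alpha} h(\theta_0, \theta_0) + \left( \widehat{\alpha}_\phi(\widetilde{\theta}_n) - \theta_0 \right)^T \widehat{P}_n \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta_0, \theta_0) + o_P\left( \frac{1}{\sqrt{n}} \right).$$

Let $U_n := \widehat{P}_n \frac{\partial}{\partial \alpha} h(\theta_0, \theta_0)$ and
$S_n := -\widehat{P}_n \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta_0, \theta_0)$.
By the CLT,

$$\sqrt{n} \, U_n \xrightarrow{d} \mathcal{N}(0, V), \qquad (A.10)$$

where $V$ is defined in (4.4).

Use condition (R.5) and the fact that
$S = -P_{\theta_0} \frac{\partial^2}{\partial \alpha \partial \alpha^T} h(\theta_0, \theta_0) = \phi''(1) I_{\theta_0}$,
in connection with Lemma 1 in Wang (1999), to conclude that

$$S_n \xrightarrow{P} \phi''(1) I_{\theta_0}. \qquad (A.11)$$

The theorem now follows from (A.10), (A.11) and Slutsky's theorem. This concludes
the proof.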